Enumerating all maximal biclusters in numerical datasets

Authors

Rosana Veroneze

Arindam Banerjee

Fernando José Von Zuben

Abstract

Biclustering has proved to be a powerful data analysis technique due to its wide success in various application domains. However, the existing literature presents efficient solutions only for enumerating maximal biclusters with constant values, or heuristic-based approaches that cannot find all biclusters or even guarantee the maximality of the biclusters they obtain. Here, we present a general family of biclustering algorithms for enumerating all maximal biclusters with (i) constant values on rows, (ii) constant values on columns, or (iii) coherent values. Versions for perfect and for perturbed biclusters are provided. Our algorithms have four key properties (only the algorithm for perturbed biclusters with coherent values fails to exhibit the first property): they are (1) efficient (take polynomial time per pattern), (2) complete (find all maximal biclusters), (3) correct (all biclusters satisfy the user-defined measure of similarity), and (4) non-redundant (all the obtained biclusters are maximal and the same bicluster is not enumerated twice). They are based on a generalization of an efficient formal concept analysis algorithm called In-Close2. Experimental results point to the necessity of having efficient enumerative biclustering algorithms and provide valuable insight into the scalability of our family of algorithms and its sensitivity to user-defined parameters.
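To make the three bicluster types named in the abstract concrete, here is a minimal Python sketch that only tests whether a given submatrix matches each pattern. It is not the enumeration algorithm and it does not check maximality. The function names, the additive reading of "coherent values", and the use of a max-minus-min spread bounded by a tolerance `eps` for perturbed biclusters are illustrative assumptions, not details taken from the abstract.

```python
def _row_spread_ok(rows, eps):
    # True if, in every row, the largest and smallest values differ by at most eps.
    return all(max(r) - min(r) <= eps for r in rows)

def is_constant_rows(sub, eps=0.0):
    # Constant values on rows: each row is (nearly) constant. eps=0 means "perfect".
    return _row_spread_ok(sub, eps)

def is_constant_columns(sub, eps=0.0):
    # Constant values on columns: transpose, then apply the same row check.
    return _row_spread_ok(list(zip(*sub)), eps)

def is_coherent(sub, eps=0.0):
    # Assumed additive model: a[i][j] = mu + alpha[i] + beta[j].
    # Subtracting the first row from every row must leave (nearly) constant rows.
    first = sub[0]
    shifted = [[v - f for v, f in zip(row, first)] for row in sub]
    return _row_spread_ok(shifted, eps)

# Hypothetical submatrices (rows x columns) used only for illustration.
const_rows = [[1, 1, 1], [4, 4, 4]]
const_cols = [[1, 2], [1, 2]]
coherent = [[1, 2, 5], [3, 4, 7], [0, 1, 4]]
noisy_rows = [[1.0, 1.2], [3.0, 3.1]]

print(is_constant_rows(const_rows), is_constant_columns(const_rows))
print(is_constant_columns(const_cols), is_coherent(coherent))
print(is_constant_rows(noisy_rows), is_constant_rows(noisy_rows, eps=0.25))
```

Under these assumptions, the constant-row and constant-column patterns are special cases of the coherent pattern, which is why `is_coherent` reuses the same spread check after shifting by the first row.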

Similar resources

Efficient mining of maximal biclusters in mixed-attribute datasets

This paper presents a novel enumerative biclustering algorithm to directly mine all maximal biclusters in mixed-attribute datasets, with or without missing values. The independent attributes are mixed or heterogeneous, in the sense that both numerical (real or integer values) and categorical (ordinal or nominal values) attribute types may appear together in the same dataset. The proposal is an ...

Identification of Bicluster Regions in a Binary Matrix and Its Applications

Biclustering has emerged as an important approach to the analysis of large-scale datasets. A biclustering technique identifies a subset of rows that exhibit similar patterns on a subset of columns in a data matrix. Many biclustering methods have been proposed, and most, if not all, algorithms are developed to detect regions of "coherence" patterns. These methods perform unsatisfactorily if the ...

FABIA: factor analysis for bicluster acquisition

MOTIVATION: Biclustering of transcriptomic data groups genes and samples simultaneously. It is emerging as a standard tool for extracting knowledge from gene expression measurements. We propose a novel generative approach for biclustering called 'FABIA: Factor Analysis for Bicluster Acquisition'. FABIA is based on a multiplicative model, which accounts for linear dependencies between gene expres...

Efficient Mining Differential Co-Expression Constant Row Bicluster in Real-Valued Gene Expression Datasets

Biclustering aims to mine a number of co-expressed genes under a set of experimental conditions in a gene expression dataset. Recently, the differential co-expression biclustering approach has been used to identify class-specific biclusters between two gene expression datasets. However, it cannot handle differential co-expression constant row biclusters efficiently in real-valued datasets. In this pa...

On Bicluster Aggregation and its Benefits for Enumerative Solutions

Biclustering involves the simultaneous clustering of objects and their attributes, thus defining local two-way clustering models. Recently, efficient algorithms were conceived to enumerate all biclusters in real-valued datasets. In this case, the solution composes a complete set of maximal and non-redundant biclusters. However, the ability to enumerate biclusters revealed a challenging scenario...

Journal:

Inf. Sci.

Volume 379

Publication date: 2017
